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QCD with dynamical Wilson fermions — first results from SESAM* 
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First results of a recently started simulation of full QCD with two flavours of sea-quarks at a coupling of /3 = 5.6 
on a 16^ X 32 lattice are presented. Emphasis is laid on the statistical significance that can be achieved by an 
"integrated luminosity" of 140 TFlopxhrs, for Hybrid Monte Carlo simulations at four intermediate values of 
The simulation takes place on the Quadrics QH2 at DESY/Zeuthen and DFG/Bielefeld. The performance 
is optimized by means of BiCGStab and the chronological inversion method of Brower et al. We discuss the 
systematic errors arising from lack of the molecular dynamic's reversibility on the 32-bit QH2. For plaquette 
and meson correlators wc find integrated autocorrelation times of < 20 units of molecular dynamics time and 
exponential autocorrelation times of about 50 units. Using these results we perform preliminary measurements 
of the central potential and n and p correlators on independent configurations and obtain first estimates of the 
lattice spacings at three values of the dynamical hopping parameter. 



1. INTRODUCTION 

There is a growing need for progress in the 
treatment of full QCD with Wilson fermions at 
zero temperature as quenched simulations are be- 
coming more and more precise and we are ap- 
proaching the point where the effects of dynami- 
cal fermions are expected to become visible. 

It is well known from the pioneering large 
scale computer experiments on Connection Ma- 
chines[l,2], however, that dynamical Wilson 
fermions are particularly difficult to simulate, and 
it remained unsettled at the time whether the 
standard Hybrid Monte Carlo (HMC) algorithm 
is suited at all to provide us with a sufScient sam- 
ple of decorrelated configurations at sufSciently 
weak coupling[3]. With the advent of powerful 
dedicated computers like APEIOO, this issue has 
been readdressed, and an autocorrelation time in 
the HMC series has been quoted (from a sample 
of 500 trajectories) to be of order 50[4]. 

Nevertheless, for a physics analysis in full QCD, 
one still lacks a reliable determination of the au- 
tocorrelation time T and a sufficiently large sam- 
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pie of decorrelated field configurations. Such 
a program requires for sure the generation of 
(9(100 X r) trajectories. To be specific: one would 
need envisage a computer experiment with order 
20.000 trajectories, to reach significant results. 

Apart from the physics motivation, such an 
experiment is highly desirable as reference point 
to assess the benefit from 'improved' algorithms 
which are currently being studied[5]. 

In this spirit we have formed a collaboration 
with the magic name 'SESAM' which is meant 
to unlock the door — through a statistically sig- 
nificant sample — to behold Sea Quark Effects on 
Spectrum And Matrix Elements. Our magic key 
carries the name of QH2 which delivers cheap real 
Gigaflops, as installed by DESY in Zeuthen and 
DFG in Bielefeld. 

2. EXPERIMENTAL PROCEDURES 

Wc employ the standard Hybrid Monte Carlo 
algorithm[6,7] (HMC) with n/ = 2 Wilson 
fermions on a lattice of size 16^ x 32, with 
fermionic boundary conditions anti-periodic in 
time. The results of the SCRTgroup on the full- 
QCD /3-function [3] suggest to go beyond /3 = 5.5 
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in order to stay away from the strong coupling 
regime. To remain with a reasonable physical 
lattice size, we chose to work at /3 = 5.6. With 
algorithmic improvements and a fast implemen- 
tation of the HMC on the QH2 (to be described 
below), we achieve a sustained HMC-performance 
of nearly 8 Gflops at a QH2 peak-rate of 12.8 
Gflops. 

Given these parameters and the performance 
data for the inversion part of the HMC quoted in 
Refs. [1,2,8], we could estimate, during the lay- 
out stage of the computer experiment, the num- 
ber of trajectories that would be achieved on 
QH2, given a years running time, as a function 
of We have normalized the trajectory length 
to T = 1, extrapolated hnearly to a residue of 



\\M^Mx-<l)\\ 



10- 



(see section 4) and — 
using the scaling laws of Ref. [1] — further ex- 
trapolated to our lattice volume ofV = 16^ x 32. 

This results in the performance plot of Fig. 1. 
After 100 days of running time, it is gratifying 
to find ourselves (SESAM) in accord with these 
expectations. 

In the tuning of k towards its chiral regime, we 

should aim at smah values of while ensur- 
ing ' 

ing finite-size effects to remain tolerable, to say 

m^L > 4. This requirement endows the available 

— range with a lower limit which marks the 'chi- 
mp o 

ral target region'. In Fig. 1 the latter is indicated 
by the vertical lines. 

During the tuning phase, we approached 
the 'target region' in an iterative procedure. 
Eventually we decided to generate a mass 
trajectory of 4 dynamical K-values, k = 
0.1560,0.1565,0.1570,0.1575. In the first hun- 
dred days of dedicated run-time, we have pro- 
duced around 5000 trajectories at three different 
values of the mass of the sea-quark, see Table 1. 

3. ACCELERATING HMC 

Lattice QCD can profit considerably from the 
development of new types of iterative solvers like 
BiCGStab[9] or QMR[10] as has recently been 
shown in Refs. [11-14]. 

In the context of HMC, we apply BiCGStab 
in a two step procedure as is the case with MR. 
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Figure 1. Performance plot: number of configu- 
rations achievable in one year of QH2 run-time, 
plotted against as estimated from preceding 
computer experiments. The plot also shows, on 
the right hand scale, the actual numbers of tra- 
jectories achieved by Refs. [1,2,8]. 



Repeatedly, during the molecular dynamics evo- 
lution, the equation M^MgX = 6 has to be solved, 
with Mg being the even-odd preconditioned Wil- 
son fermion matrix Mg = 1 — K^DgoDgg, where 
the Wilson fermion matrix is given as 



M 



1 



KDge 
1 



The solution proceeds in two-step mode: 



first: 



ly = (f>, second: MgX = y, 



(1) 



(2) 



where both equations are solved with equal accu- 
racy, see below. It has been shown in Ref. [11] 
that the resulting solution vectors x are equal 
within round-off errors compared to a direct CG 
solution of M^M^x = (j). 

The multiplication by the matrix Mg requires 
28 milli-seconds on the QH2 for the 16^ x 32 lat- 
tice. At K = 0.1575 we need about 40 minutes 
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to run one trajectory of 100 molecular dynamics 
steps. 

Fig. 2 plots the accumulated residual r as func- 
tion of the cpu-timc, computed on the QH2 for a 
typical thermalized configuration at our smallest 
bare quark mass. We found that overrelaxed MR 
outperforms CG, yet BiCGStab beats ORMR. 
This is shown in the figure and is accordance with 
Ref. [11]. The gain is typically 30 % compared to 



where 



(4) 



In second order, this procedure had been sug- 
gested before in Ref. [7]. 

In Fig. 3, wc plot the number of iterations along 
the molecular dynamics evolution. We have em- 
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Figure 2. Accumulated residual r as function 
of the cpu-time in the two step inversion of 
M^MgX = (j). 

the performance of BiCGStab. We note that the 
true residual r runs into a plateau at r = 10^^ 
as the matrix multiplication is carried out in sin- 
gle precision. Note that global operations inside 
the inverter are emulated in Kahan precision[15], 
however. 

A method to further improve any inverter's 
performance has been presented in Ref. [16]: as- 
suming a smooth behaviour in the molecular dy- 
namics evolution of the solution vector x for small 
molecular dynamics time steps dt, one can guess 
the vector Xn+i at time step n + 1 from the pre- 
ceeding vectors a;„, x„_i . . .Xi. We use the simple 
polynomial formula as given in Ref. [16]: 

n 

— ^^CiXi, (3) 



Figure 3. Extrapolation of the start vector x to 
8th order in dt at a K-value of 0.1570. 



ployed an extrapolation up to 8th order to deter- 
mine the starting vector. Again plotting a typical 
situation, we see that nearly a factor of 2 can be 
gained in the number of iterations after the start- 
up phase. We note that there are a few particular 
trajectories that cannot be accelerated using this 
method. For k = 0.1575, we go to 11th order. 

4. REVERSIBILITY 

It has been pointed out recently[17] that HMC 
carries a positive Lyapunov exponent. Therefore 
one has to worry that the ensueing irreversibility 
docs not spoil detailed balance in the HMC pro- 
cess on the 32-bit Quadrics machine. It is impor- 
tant to make sure that the Lyapunov amplitude 
remains small along a trajectory. 

A measure for this is the norm of the mis- 
match[17], between an initial and a final gauge 
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configuration {Ui} and {Uf} 'around a closed 
loop': 



\dU\ 



V 



(5) 



{Uf} is the result of a molecular dynamics inte- 
gration, where, after Nmd steps (turning-point), 
the momenta and the time direction are reversed, 
p —p, dt — >■ —dt, such that after another N^d 
steps we should arrive at Uf = Ui — given the nu- 
merical procedure to be exact. Numerical errors, 
however, can cither be amplified or suppressed by 
the dynamics and — if the asymptotics of \ \dU\\ is 
defined as 



(6) 



— either a positive of negative exponent v will be 
found. 

In Fig. 4, we have drawn the norm \\dU\\ as 
function of the number of molecular dynamics 
steps Nmd at the turning-point. On a thermal- 



the above mismatch on A5. The violation of de- 
tailed balance is expressed by the factor Z, 

W{U)PiU ^U') =WiU')PiU' ^U) ■ Z, (7) 

in terms of the fixed point distribution W{U) and 
the evolution probability P, where 



Z = exp(<5(AS')). 



(8) 



(5(AS') is the (systematic) error in AS and a mon- 
itor for the violation of detailed balance. 

In order to acquire a feeling for the relevance of 
this error and its dependency on the precision of 
the inversion on the QH2, we computed AS along 
a closed trajectory loop, integrating forward and 
then backwards in time, with A^^^ = 100 for 
1G~^ > r > 10~* on one thermalized 16^ x 32 
lattice. During this operation, the global ac- 
tion is carefully tracked in double precision arith- 
metic[15]. The resulting mismatch value in AS 
that we denote by S{AS) is plotted vs. r in Fig. 5. 
Note that this calculation is done with an 11th or- 
der guess as described above. While AS ~ 1, we 
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Figure 4. \\dU\ \ as function of N^d at the turning- 
point, r = 10~*. 



Figure 5. (5(A5) as function of r. 



ized 16^ X 32 lattice for both quenched and full 
QCD, we find an equal value of v = 0.75. 

With regard to the global HMC Monte Carlo 
decision, it is important to study the impact of 



find 6{AS) to settle on the lO'^ level at r < 10"'^, 
the saturation being indicative for the impact of 
the 32 bit accuracy to that very precision in S. As 
the quantity S{AS) is a measure for the size of the 
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Lyapunov amplitude we conclude that the dan- 
gerous instability appears to be rather marginal 
under the conditions of our HMC process, with its 
stochastic pivot amplitude ASl Note that during 
the actual simulation we used r = 10^^. 

Alternatively, one might argue that the size 
of the expectation value of exp(— AS*) — 1 is a 
measure for the effective violation of reversibility 
along the time history of an HMC run. We found 
(exp(-A5) - 1) = 0.009 ± 0.022 on a contiguous 
sample of 1000 configurations for k = 0.1575! 

5. RESULTS 

Table 1 lists the number of trajectories that 
have been generated in about 100 days cpu-time 
on the QH2. The second row gives the numbers of 
configurations that we consider to be thermalized. 



Table 1 

Number of trajectories generated so far. The last 
row shows the number of thermalized configura- 
tions with anti-periodic boundary conditions. 





total # of 


# of thermalized 




trajectories 


trajectories 


0.156 


564 


160 


0.1565 








0.157 


2478 


1800 


0.1575 


1795 


1400 



In Table 2 we give the HMC run parameters 
that we chose for so far three Ksea values at f3 = 
5.6. 

5.1. Autocorrelation 

Knowledge of the autocorrelation is instrumen- 
tal for assessing the statistical relevance of a given 
sample of trajectories. As we archive all config- 
urations we are in the position to compute the 
autocorrelation of any operator. During produc- 
tion we have monitored in situ the plaquette and 
the pion propagator (on one time slice) for our 
two smallest quark masses. 



Table 2 

Run parameters of HMC simulation. In the sec- 
ond half of the runs we have switched over to 
variable N^d uniformly distributed with a width 
of 20. 

Nmd dt r acc/% ( # it.) 
0.156 100 0.01 10-^ 85 % 100 
0.157 100 0.01 10-** 82 % 166 
0.1575 100 0.01 10-8 76 % 299 



The exponential autocorrelation time, T^xp, is 
defined as the fiattest slope in the logarithm of 
the autocorrelation function and is related to the 
length of the thermalization phase[18]. In princi- 
ple, it is universal, i. e. independent on the partic- 
ular underlying operator. In practice one cannot 
guarantee the 'slowest' mode to be really the slow- 
est. In this paper we assume the thermalization 
to be 5 X T,,xp, i. c. we require the influence of the 
initial configuration onto the "first" thermalized 
configuration to be of order 0(exp(— 5)). 

The statistical error of an operator O is char- 
acterized by the (operator dependent) integrated 
autocorrelation time. Fortunately, slow modes go 
along with small amplitudes, and therefore we can 
determine a reasonable estimate of Tint from the 
behaviour of the leading modes only. 

A sensible measurement of the autocorrelation 
function requires a sufficient lengtii of tiie Markov 
chain. This fact prevented a reliable measure- 
ment of HMC generated configurations on large 
lattices up to now. In Fig. 6, we have plotted 
the autocorrelation function of the plaquette P. 
In order to demonstrate the impact of the sam- 
pling on the correlation measurement we display 
C{t) vs. t as a function of the sample size. We 
find that there exists a threshold in sample size 
(at about 1200 trajectories!) above which a slow 
mode can be identified in our time series analysis: 
from thereon the slope of \og{C{t)) is unravelled 
from the noise^. 

Table 3 presents the values for the autocore- 

^For the estimation of tlie error of C{t) we took into ac- 
count the integrated autocorrelation time of the autocor- 
relation function itself! 
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Figure 6. Autocorrelation function C{t). The 
curves correspond to various lengths of the con- 
tiguous sample. A rehable determination of the 
slowest dynamical mode can be performed with a 
sample of 1400 trajectories, whore C{t) follows a 
single exponential in the range of 20 and 95 time 
units. 



1) for the plaquette P and the pion propagator 
at T = 16. We performed uncorrelated fits to 
log(C(i)) to extract Texp and estimated the in- 
tegrated autocorrelation time Tint according to 
Tint = limt^oo -4(1 - exp(-i/T)). 

It is gratifying to find r well below the value of 
100 trajectories! On the basis of this result, it ap- 
pears realistic, with the present means, to achieve 
a final sample of 100 independent configurations 
at each of the 4 k values quoted in Table 4. 



Table 3 

Exponential and integrated autocorrelation 

times of plaquette and pion at k = 0.1570 and 
0.1575. 



^sea 


0.1570 


0.1575 


Tint of plaquette 


12.3(5) 


14.8(8) 


Tint of pion propagator 


6.9(8) 


9.2(9) 


Texp of plaquette 


31(17) 


39(31) 


Tgxp of pion propagator 


34(15) 


57(26) 



Table 4 

Estimate of the number of configurations of our 
mass trajectory that can be generated within 140 
TFlopxhrs cpu time on the QH2 to end up with 
« 100 independent configurations at each k value. 



Ksea 


total # of trajectories 


0.1560 


4000 


0.1565 


5000 


0.1570 


6000 


0.1575 


8000 



lation times we computed so far. Our determi- 
nations of the exponential and integrated auto- 
correlation times are based on our thermalized 
samples at k = 0.1570 and 0.1575 (see Table 



5.2. Static Potential 

We^ compute the static (spin independent) po- 
tential V^(-R) from the path-ordered products of 

^This part of the work is done in collaboration with 
A. Wachter. 
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link variables around space-time rectangles per- 
forming measurements on configurations sepa- 
rated by 50 units of time - this corresponds to 
more than twice the integrated plaquette auto- 
correlation time (see section 5.1). 

In order to enhance 

the ground-state signal, we use a local smearing 
procedure on the spatial Hnks as depicted below. 

— -N a 



Here, N is a projection operator onto SU(3); the 
smearing parameter a is set to a = 2. 100 smear- 
ing iterations are performed giving significant im- 
provement of the signal for all distances. Overlaps 
are bigger than 90 % for most R. 

Off-axis measurements are included for im- 
proved spatial resolution; this also allows an in- 
vestigation of rotational invariance restoration. 
We perform measurements at 36 different values 
of J?. 

The potential is defined through local masses; 
for fixed time T : V^iR) = log ^^Jg^. We 
find good plateaus for T > 3. The data are fitted 
to the following four-parameter ansatz : 



V{R) =Vo + kR 



^+9(^-1^])^ (9) 



where [-^] is a lattice propagator for the one- 
gluon exchange (see e.g [20]). Table 5 shows the 
results for three of our sea-quark masses with es- 
timates of the lattice spacings aqq obtained from 
k = a^a with ^ = 440 MeV. These results are 
preliminary; errors are purely statistical and ob- 
tained from a bootstrap analysis with a sample 
size of 100. The data are shown in Fig. 7. In 



Table 5 

Estimates of the lattice spacings obtained from 
the string tension (preliminary). 



^sea 


# of configs 


k 


a-'[GeV] 


0.156 


6 


0.0608(44) 


1.8(2) 


0.157 


29 


0.0437(18) 


2.1(1) 


0.1575 


12 


0.0340(18) 


2.4(1) 
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Figure 7. The static potential (in lattice units) 
for the three Ksea- values (preliminary). 



the future, with substantially improved statistics, 
we will compare these findings to lattice spacings 
obtained from the inter-quark force radius a la 
Sommer[21]. 



5.3. Light Spectrum 

As part of an ongoing study of the light-hadron 
spectrum we present preliminary results for the 
lattice spacing obtained from the mass of the rho. 

This simulation is being performed with a set of 
10 combinations of 4 valence quarks (ki,K2) on 
the three different sea-quark configuration sets. 
As above, we use decorrelated trajectories sepa- 
rated by more than twice the integrated pion au- 
tocorrelation time. Wc calculate smcared-local 
and smeared-smeared combinations of correla- 
tors, performing 50 smearing-iterations and using 
'Gaussian' Wuppertal smearing[19]. 

So far, we have analysed 4 configurations at 
K = 0.157 so that correlated fits are not yet 
possible over a large t-range. Using only the 
smeared-smeared data, we perform linear chiral 
extrapolations in to find Kcrit = 0.1596 and 
also in nip to find a preliminary lattice spacing 
a„i = 2.48(20)Gey. These fits are shown in fig- 
ure 8. 
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Figure 8. Linear extrapolations of ml and nip for 
a set of 10 combinations of 4 bare quark masses 
at fixed sea-quark Ksea = 0.157. 



6. Conclusions 

The main conclusion at this stage of the work 
is positive: the autocorrelation time in the HMC 
series is well below 100, enabling us to achieve — 
in 140 TFlopsx hours — a decent statistics on the 
target sample of 100 independent field configura- 
tions. 
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